Greedy Approach for Low-Rank matrix recovery 



A. Petukhov^, I. Kozlov^ 

^Contact Author, Department of Mathematics, University of Georgia, Athens, GA 30602, USA, 

petukhov @ math.uga.edu 
^Algosoft Tech USA, Bishop, GA, USA, inna@aIgosoft-tech.com 

for IPCV'13 



cn : 

o 
<^ . 

u . 

Oh' 

in : 



< 



> 



o 



% 



Abstract — We describe the Simple Greedy Matrix Comple- 
tion Algorithm providing an efficient method for restoration 
of low-rank matrices from incomplete corrupted entries. 

We provide numerical evidences that, even in the simplest 
implementation, the greedy approach may increase the re- 
covery capability of existing algorithms significantly. 
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1. Introduction 

We consider a greedy strategy based algorithm for the 
recovery of the low-rank matrix from incomplete corrupted 
samples. 

The problem of low-rank matrix completion is not new. 
However, it got a new impulse (|4|, |2|) in connection 
with the development of the compressed sensing theory 
and algorithms and ideas to use the l^ minimization as 
a surrogate for the sparsest solution (|^, [6], [,12] ) . 

This paper can be considered as a feasibility study for 
the methods inspired by ideas from both low-rank matrix 
completion and our compressed sensing oriented £^ -greedy 
algorithm (Q, EOl, KHI)- 

The problem is set as follows. It is required to restore 
(complete) the matrix A G M™^" of rank r, r < min{TO, n}, 
given by its k entries, k < nm. The set of the given entries 
is rj C {1, . . . ,m} X {1, . . . ,n}. \n\ is the cardinality of O 
(which in our case is equal to k). We also introduce notation 
d{fl) for the density of the set il, d{n) :— \il\/{nni). 
The complementary set fj is a set of erasures, 1 — d{il) 
represents the density of erasures. The theoretical bounds 
for recoverability of the matrix depends not only on the 
density of the samples but also on the matrix A and on the 
2D-geometry of fl. The matrix consisting of only one non- 
zero entry is the simplest example of a rank 1 matrix which 
can be restored only if the value at the non-zero entry is 
known. Anyway, it turned out (cf. [2|) that under quite mild 
conditions random matrices of size nx n and rank r can be 
recovered from at most 0{rn^'^logn) entries as a matrix 
with the minimum of nuclear norm. 

The popularity of that problem can be explained by 
an enormous number of applied problems which can be 
formulated in terms of matrix completion. Among many 



settings in different applied areas, we mention problems re- 
lated to image processing. Image inpainting, including more 
particular image upsampling, face recognition technique, 
motion tracking and segmentation in video are most typical 
of those problems. While the problem of low-rank matrix 
completion was studied long ago, the theory got a big push 
due to development of Compressed Sensing / Compressive 
Sampling (CS) technique. After some simplification, the CS 
data decoding goal can be reduced to solving underdeter- 
mined systems 

Ax = y + e, (1) 

where x G M" is a sparse vector of data "encoded" with the 
known to the decoder matrix A G M™^", m < ti, y G M™ 
is a vector of measurements (of x) possibly corrupted by the 
vector e G M". Here and bellow we assume that the sparse 
solution X exists and the vector of errors e is also sparse. 
The sparsity of a G M^ means that 

|a|o := |{a» ^ 0}| < TV. 

The value |a|o is called the Hamming weight of the vector 
a. Since the problem of finding sparse solutions has non- 
polynomial complexity (lH)), the mainstream CS researches 
suggested to use to replace the minimization of |x|o (or 
|x|o+|e|o) with the minimization of £^-norm. It turned out 
that that such approach based on convex optimization gives 
the optimal sparse solution at least when |x|o is not very 
large (cf. [6J, \3\, lfT2l for the case e — 0). Thus, in 
some special cases the original non-convex problem can be 
reduced to convex programming. In what follows, like for 
the notion for the Hamming norm, we use notation | • \p, 

< p < oo, for element-wise (quasi-)norms of vectors an 

/ \ i/p 

matrices. Say, for the matrix A, \A\p := (^^ ■ |Aij|P J 

In particular, | • I2 is the Frobenius norm. The inner product 

of 2 matrices A and B is defined as {A, B) :— \i?LCt{A^ B). 

Thus, (A, A) — \A\2. The notation || • ||p is reserved for the 

operator norms of matrices. 

CS results inspired the authors of |l2] and l|4] on replacing 

the minimum rank condition leading to non-polynomial 

complexity with the minimization of the nuclear norm 

II ^11* ■= ^^i of the matrix A, where ai are singular values 



of A. To be more precise, the problem 

\\A\\^, — !■ min subject to Ajj = Alij, {hj) £ ^, 

where M, j are the known entries (measurements) of the ma- 
trix A, is considered as relaxation of the rank minimization 
problem above. 

Many different settings giving a solution of the original 
problems have been studied for the last years. In most cases, 
the intention of those studies was to find the faster algorithms 
with the higher capability of the recovery. Typically, modifi- 
cations of the problem leading to unconstrained optimization 
were introduced. 

2. Basic Algorithm 

For our experiments we need the algorithm providing 
convex minimization recovering low-rank matrices from 
incomplete corrupted samples. It is used as a basic con- 
structive block in our algorithm. The problem of restoring 
matrix from corrupted entries is less studied than the simpler 
matrix completion problem when the available entries are not 
corrupted. Anyway, there are a few computationally efficient 
algorithms solving that problem (e.g., ||5|, (ISl, lfT4l ). 

For our purposes, we selected the algorithm from [8| 
based on the method of Augmented Lagrange Multipliers 
(ALM) (e.g., |1|). Having corrupted samples, instead of 
finding the matrix with the sparsest set of singular values 
coinciding with the measurements on as large as possible 
set, the algorithm finds the minimum of the functional 

L{A,E,Y,^l) := \\A\U + X\E\i + {Y,R) + ^\R\l 

where R — M — A — E is the residual of approximation of 
the measurements M of the estimate unknown matrix matrix 
A and the estimate of the unknown matrix of errors E. The 
entries of the input matrix AI on the ft are unknown. It is 
assumed that R vanishes on Cl and does not contribute into 
the third and the forth term as well as E does not contribute 
into the second term. 

If it is known that the observed entries in M are not cor- 
rupted, the second term can be omitted. However, we assume 
that we never know whether the entries are corrupted. So, 
in what follows, we minimize the 4-term functional. 

We will need the following notation 
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X > e, 
X < —e, 
otherwise; 

where x can be either a number or a vector or a matrix. 
The operator S^ is called the shrinkage operator. The norms 
II ■ II 2, II • I loo applied to matrices mean the operator norms. 

The minimization algorithm, as it is described in fSl and 
implemented in Matlab code, is as follows 

Algorithm ALM. 

Input. Observation matrix M G M'"^", defined on fl, and 
A>0. 



lyfe). 



Initialization. ¥"> = ,„ax{ || M|b', || A/|| 

^0 ^ 0, ^0 > 0, p > 1, fc = 0; 

1 . while not converged do 

2. ([/, S, V) := svd(Af ~ E^ + /i" 

3. A'^+i ■.= US^-i[S]V^; 

4. E''+^ := S^^-i [M - A''+^ + ^i-^Y'']; 

5. r'-^+i := r^ + ^fe(A/ - A*^+i - E''+^) 

6. Hk+i := P^J■k, fc ;= fc + 1; 

7. end while. 
Output. A''+^, £''=+1. 



3. Our algorithm 

Our modification of the algorithm above is inspired by 
significant success reached by applying greedy ideas to 
solving underdetermined systems (Q, ifTOl . ifTTI ). The gen- 
eral greedy strategy in optimization algorithms consists in 
sequential finding a simple suboptimal solutions giving some 
information about the optimal solution. A greedy algorithm 
picks up some most obvious features or elements of those 
solutions and gives them a privilege to be pivot for the next 
iteration of the suboptimal algorithm. Each iteration brings 
new pivot elements. 

In the matrix recovery algorithm, the erasures from the 
set fj forms such group from the beginning. Whereas, the 
elements of il are just suspicious to be erroneous. If we 
have sufficient evidence that some element in fl contains 
a random error independent of the content of other entries 
from n, that the decision to move this element to Ct is quite 
justifiable. While the independence condition is not always 
accurate even in our experiments with artificially generated 
data, we use this strategy for estimation of the capability of 
the greedy ideas for matrix recovery. 

Our greedy algorithm consists in iterating with an updated 
(dilated) sets ftk- We will call it the Simple Greedy Matrix 
Completion Algorithm (SGMCA). Generally speaking , any 
matrix recovery algorithm, including SGMCA itself, which 
is able to fight the mixture of erasures and errors can be 
used as a basic block of SGMCA. 

Formally, all our experiments can be described in the 
following way. 

Algorithm SGMCA. 

Input. M, n. Initialization. A > 1, fio = ^, ^" -^ M, 
E" ■.^0,0<q <1, k^O. 

1. Set fc := fc + 1; 

2. A'' = ALM{M, nk); 

3. if fc == 1 then Ti = 0.3max,j{|Ai^. - A/y |; else Tk := 
OMTk-u 



4. fife+i=fifc\{(«,J 

5. if not converged go to 2. 



Ak 



M,,\ >Tk}; 



4. Numerical Experiments 

Since our intention was to conduct algorithm feasibility 
study, the goal of this section is to give comparison with 



the output of recently published algorithms and with pure 
ALM (one iteration of the algorithm above with no update). 
The parameters in the basic algorithm are selected as /^o = 
0.3/||A-f II2, p = l.l + 0.5\n\/{mn). The parameter A is 
defined by the combination of d{il) and the density of errors 
in n samples. The general trend can be characterized as 
follows: the higher error rate, the less value of A has to 
be used. In what follows, we do not use fine tuning of A. 
The same A is used for big groups of experiments. At the 
same time, tuning A may bring significant increase of the 
algorithm efficiency. In this paper, we use values of A in the 
range 0.02 ^ 100 

For our experiments, we used Matlab implementation 
of ALM algorithm available at are available at 
|http : //perception . csl . Illinois . ed u 
/matrix-rank/home .html We used the code for 
Matrix Completion via the inexact ALM Method with our 
adaptation to the input with errors. 

In all our experiments, A are square ray.n,m. = n matri- 
ces of rank r obtained as A = UV^ , where U,V & M"^''. 
The matrices U and V consist of independent gaussian 
random values with zero expectation and the variance 1 . The 
coordinates of erasures were selected randomly. The models 
of errors below were different for different experiments. 

In the first experiment (Fig.l), we demonstrate advantage 
of the iterative SGMCA over ALM (one iteration of the 
same algorithm). For the matrix with fixed sizes m x n, 
m = n, and the rank of matrices r = 15. The solid curves 
on Fig. 1 correspond to SGMCA (up to 10 iterations) for 
n = 128, 512, 1024 (from the bottom to the top). The 
corresponding graphs for ALM algorithm are plotted with 
dashed curves. 
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Fig.l. Greedy iterations vs. convex ALM optimization 

The horizontal coordinate indicates the fraction of the 
matrix available for restoration (i.e., d{n)), while the vertical 
coordinate is the fraction of randomly corrupted entries in 
il. The magnitude of the corruption is randomly set from 



the standard normal distribution. The curves define "phase 
transition" bounds. In our experiments, we run 10 trials. At 
the points of curves all 10 attempts were accomplished with 
success, i.e., for the obtained estimate A, \A — A\2/\A\2 < 
10^"^, whereas for the points above the curves, at least one 
attempt failed. It is assumed that the regions under the curves 
are regions of "success". 

The second experiment (Fig. 2 and Fig. 3) is devoted to 
comparison with the results from [5]. Unfortunately, we do 
not have full information about the error model. So we use 
the same additive model of errors as above. While all other 
parameters are taken from [5|. The matrix of rank 2 is 
constructed as above. Its size changes from 100 to 3000. 

The experiment consists of 2 parts. The first plot (Fig.2) 
contains the curves for the fixed erasure rate 0.1, i.e, 
d{il) = 0.9. However, the probability of errors in those 
entries varies. There are 3 graphs on Fig. 2. The solid line 
corresponds to 10 iterations of SGMCA, the dashed line 
corresponds to ALM, and the dotted curve corresponds to 
the result from (]5]. The values defined by curves give the 
maximum error probability admitting successful correction 
by the corresponding algorithm. If we were aware of the 
error model from |]2, the dashed and dotted curves have to 
coincide up to statistical discrepancy. 

On the second plot (Fig. 3) the error rate is fixed and equal 
to 0.1. The graphs show dependence of maximum possible 
rate of erasures from the size of matrix. 
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Fig.2. Admissible error rate for erasure rate 0.1. 

The efficiency of the algorithms is defined by the distance 
of curve values on Fig. 2 and Fig. 3 to 1. It is easy to 
see that, when the error rate is fixed, SGMCA curve is 
twice closer to 1 than ALM. Hence, SGMCA restores low 
rank matrices from only half of entries necessary for the 
ALM minimization. For the fixes erasure rate, the number 
of uncorrupted entries for SGMCA successful restoration can 
be only one third of that necessary for recovery with ALM. 
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Fig.3. Admissible erasure rate for error rate 0.1. 
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Fig.4. SGMCA vs. RTRMC. r = 5. 



The more precise value of the SGMCA graph at m — 
3000 on Fig 3 is 0.986, i.e., having error probability 0.1, the 
matrix can be restored from 1.4% of random entries. 

In our last experiment (Fig. 4-6), we compare the output 
of SGMCA with the results of RTRMC algorithm from 
|[l4l providing very impressive recovery. However, for such 
successful recovery it requires a priori knowledge of the 
rank of the matrix A. The ALM-based algorithm used as a 
basic algorithm in SGMCA does not require any knowledge 
about the rank while the rank knowledge is useful for it. To 
provide equal opportunities we applied the internal fixation 
of the rank inside ALM procedure. 

The results for ranks r = 5, 15, 25 are given on Fig. 
4-6 correspondingly. The size of matrices is 512 x 512. 
The horizontal coordinate is d{il), whereas the the vertical 
coordinate is the probability of errors in the coefficients 
available for reconstruction. In most of cases, SGMCA out- 
performs RTRMC by 15-25% in the maximum admissible 
probability of errors. The reason why SGMCA loses on 
interval [0,0.175] on Fig.4 is the parameter A = 0.02 
which was fixed for all 3 experiments. Setting A = 0.2 on 
that interval, we would get the overwhelming advantage of 
SGMCA. We emphasize that when the rank is known in 
advance, the optimal parameter A can be computed for each 
d{n). This would not contradict the equal opportunity of 
the two algorithms. Optimal selection of A is a reserve not 
used in our experiments. 

The model of data in |T?| is identical to the model 
described above. Whereas, the error model is different. 
The values of the corrupted values are randomly uniformly 
distributed between minimum and maximum of uncorrupted 
values. We also use that error model in our experiment. 
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Fig.5. SGMCA vs. RTRMC. r = 15. 
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Fig.6. SGMCA vs. RTRMC. r = 25. 
The dashed line corresponding to RTRMC is shorter than 



our solid line since we used the data directly from llT4l . 

5. Future Studies 

This study shows that even the simplest implementation 
of the greedy idea in the form of SGMCA outperforms the 
recent algorithms significantly in the recovering ability for 
very incomplete measurements with high level of corruption. 
The results above show the feasibility of the idea, its 
perspectives and a high level of expectation. 

Because of its iterative nature, the algorithm has to repeat 
basic step a few times. We restricted a number of iterations 
by 10 however in most of cases 5 iterations provided neces- 
sary precision. Among possible directions for improvement, 
acceleration ways are needed to be considered. One of 
possible ways is to do not wait for the completion of each 
iteration, updating il within internal iterations of the basic 
algorithm. In the simplest, but maybe in less efficient form, it 
can be done even with no intrusion into the basic algorithm. 
For instance, when the greedy step looks for coordinates 
of large errors, we do not need high precision output of 
the basic algorithm. So an update of the precision on each 
iteration from low to high may accelerate the algorithm for 
the data close to the limits of the potential recovering ability 
(phase transition points). However, it should be noted that 
this modification may slightly slow down the recovery of 
the data located far from the phase transition points. Other 
way for acceleration skipped by us in this paper is to use the 
estimate of A on the previous iteration as a basic algorithm 
start point for the next iteration. 

Now we discuss the ways for increasing the capability 
SGMCA in the matrix recovery. 

First of all, in our experiments, we practically did not use 
fine tuning of the weight A. We used fixed A for big range 
of the parameters of input data. Whereas, just by replacing 
the value 0.02 with the value 0.2, the results presented on 
Fig. 4-6 can be significantly improved for the high level of 
erasures (on the left side of the graphs). Indeed, when the 
rank of A and the model of errors is known, the optimal 
values of A admitting the maximum density of error can be 
found in advance for any number of erasures and used in 
the recovery procedure. 

When the rank is unknown or nature of possible corruption 
is unknown in advance, adaptive finding A becomes a chal- 
lenging problem. This problem has many common features 
with the problem of finding sparse solutions of underdeter- 
mined linear systems with corrupted input. In the mentioned 
problem, the weight A is defined by the interaction between 
the sparsity of the possible solution and the error vector In 
the problem of of matrix completion, the low rank plays a 
role of the solution sparsity in CS. So the methods (or at least 
principles) developed in CS can be applied to the matrix 
completion problem. In [11], we showed that the sparsity 
of the solution can be reliably estimated on the dynamic 
of change of the value |x|o.5/|x|2. The same characteristic 



of the matrix A intermediate estimates can be used for 
the matrix completion procedure. We also have to say that, 
generally speaking, for finding A we do not need both the 
rank and the sparsity of errors estimates. Indeed, if we have 
the sparsity of errors, we can evaluate the potential maximum 
rank r admitting recovery with the given algorithm provided 
that A is optimal. Then that optimal A will also provide 
recovery of matrices with the rank less than r. Thus, the 
adaptive A is one of quite realistic sources for increasing 
algorithm capability. 

6. Conclusion 

The paper presents feasibility study for the Simple Greedy 
Matrix Completion algorithm. We showed that it outper- 
forms recently developed algorithms of matrix completion 
significantly. We also discussed the ways for further increase 
SGMCA efficiency. 
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